{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.sys.path.append(os.path.dirname(os.path.abspath('.')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from datasets.dataset import load_boston\n",
    "from model_selection.train_test_split import train_test_split\n",
    "\n",
    "data=load_boston()\n",
    "X=data.data\n",
    "Y=data.target\n",
    "\n",
    "X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2)\n",
    "\n",
    "# 为便于numpy矩阵乘法之间的维度维护，统一将Y转换成列向量，与一行一个样本对应\n",
    "Y_train=Y_train.reshape((-1,1))\n",
    "Y_test=Y_test.reshape((-1,1))\n",
    "\n",
    "# print(X_train.shape,Y_train.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据$X$是一个$(n{\\times}m)$的矩阵，每一行是一个样本，每一列代表一个特征："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_sample=X_train.shape[0]\n",
    "n_feature=X_train.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 粗略模型\n",
    "\n",
    "模型表达式为：\n",
    "$$\n",
    "\\hat{y}=wx+b\n",
    "$$\n",
    "在向量化操作时，其中的$y$、$w$、$x$都会被矩阵形式替代。为了便于与后期深度学习的一致性，设$X$的维度为$(n\\_sample,n\\_feature)$，设$W$的维度为$(n\\_feature,n\\_output)$，偏置系数$b$为单变量系数。模型可以写成：\n",
    "$$\n",
    "\\hat{Y}=XW+b\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "W = np.random.randn(n_feature).reshape((n_feature, 1))  # 权重，列向量\n",
    "b = 1  # 偏置\n",
    "\n",
    "Y_hat=np.dot(X_train, W)+b\n",
    "\n",
    "# print(W.shape,Y_hat.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型的损失函数为：\n",
    "$$\n",
    "\\begin{align}\n",
    "L&=\\sum\\limits_{i=1}^n(y^{(i)}-\\hat{y}^{(i)})^{2} \\\\\n",
    "&=\\frac{1}{n}(Y-\\hat{Y})^{T}(Y-\\hat{Y}) \\\\\n",
    "\\end{align}\n",
    "$$\n",
    "损失函数关于参数$W$与$b$的梯度可以求得：\n",
    "$$\n",
    "\\begin{align}\n",
    "\\frac{\\partial{L}}{\\partial{W}}&=\\frac{2}{n}[X^{T}(\\hat{Y}-Y)] \\\\\n",
    "\\frac{\\partial{L}}{\\partial{b}}&=\\frac{2}{n}[1,1,...,1]{\\cdot}(\\hat{Y}-Y) \\\\\n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "dW = 2 * X_train.T.dot(Y_hat - Y_train) / n_sample\n",
    "db = 2 * np.sum(Y_hat - Y_train)/ n_sample\n",
    "\n",
    "# print((Y_hat - Y_train).shape,dW.shape,db.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数的迭代更新公式：\n",
    "$$\n",
    "W:=W-{\\alpha}\\frac{\\partial{L}}{\\partial{W}}, \\quad b:b-{\\alpha}\\frac{\\partial{L}}{\\partial{b}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_iter=2000\n",
    "alpha=0.000001        # 注意学习率过大会导致震荡，然后误差越来越大\n",
    "\n",
    "for i in range(max_iter):\n",
    "    Y_hat=np.dot(X_train, W)+b\n",
    "    \n",
    "    dW = 2 * X_train.T.dot(Y_hat - Y_train) / n_sample\n",
    "    db = 2 * np.sum(Y_hat - Y_train)/ n_sample\n",
    "    \n",
    "    W = W - alpha * dW\n",
    "    b = b - alpha * db"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用该模型分别对训练集与预测集做预测："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_pred_train=np.dot(X_train, W) + b\n",
    "Y_pred_test=np.dot(X_test, W) + b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义一个RMSE损失函数来评价模型的表现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6449138841214859 1.3965156320353282\n"
     ]
    }
   ],
   "source": [
    "def RMSE(Y_true,Y_pred):\n",
    "    return np.sum((Y_true-Y_pred)**2)**0.5/len(Y_true)\n",
    "\n",
    "print(RMSE(Y_train,Y_pred_train),RMSE(Y_test,Y_pred_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型简单打包："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10.421461312004459\t1.5753299230529596\t1.3010925384326066\t1.1150379318323123\t0.9900455051928722\t0.9059846965894982\t0.8486191309906469\t0.8083911440623579\t0.7791472575454574\t0.7570256866554005\t"
     ]
    }
   ],
   "source": [
    "def linear_reg(X,Y,alpha=0.000001,max_iter=2000):\n",
    "    Y=Y.reshape((-1,1))\n",
    "    \n",
    "    n_sample=X.shape[0]\n",
    "    n_feature=X.shape[1]\n",
    "    \n",
    "    W = np.random.randn(n_feature).reshape((n_feature,1))  # 权重\n",
    "    b = 1  # 偏置\n",
    "    \n",
    "    for i in range(max_iter):\n",
    "        Y_hat=np.dot(X, W)+b\n",
    "\n",
    "        dW = 2 * X.T.dot(Y_hat - Y) / n_sample\n",
    "        db = 2 * np.sum(Y_hat - Y) / n_sample\n",
    "\n",
    "        W = W - alpha * dW\n",
    "        b = b - alpha * db\n",
    "        \n",
    "        if i%200==0:\n",
    "            Y_hat=np.dot(X, W)+b\n",
    "            L=np.sum((Y-Y_hat)**2)**0.5/n_sample\n",
    "            print(L,end='\\t')\n",
    "\n",
    "    return W,b\n",
    "\n",
    "W,b=linear_reg(X_train,Y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据归一化\n",
    "**Normalization：**\n",
    "$$\n",
    "x=\\frac{x-x_{min}}{x_{max}-x_{min}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "X=np.row_stack((X_train,X_test))\n",
    "\n",
    "X_max=X.max(axis=0)\n",
    "X_min=X.min(axis=0)\n",
    "\n",
    "X_train_norm=(X_train-X_min)/(X_max-X_min)\n",
    "X_test_norm=(X_test-X_min)/(X_max-X_min)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对数据归一化之后再测试模型表现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.1866179843366065\t1.1851727557939387\t1.1837300851998311\t1.1822899688901225\t1.1808524032054784\t1.1794173844913827\t1.1779849090981265\t1.1765549733808003\t1.1751275736992812\t1.1737027064182264\t"
     ]
    }
   ],
   "source": [
    "W,b=linear_reg(X_train_norm,Y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为数据做了归一化，整个数据集上的梯度分布得到了改良，所以可以调大学习率，由此可以看出数据标准化在线性回归上的威力："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5898723758342488\t0.25669946297613266\t0.24193280206822776\t0.23824895656765277\t0.23642904227502215\t0.23526814606885943\t0.2344731715988042\t0.23391549904818665\t0.23351978554054928\t0.23323705925155527\t"
     ]
    }
   ],
   "source": [
    "W,b=linear_reg(X_train_norm,Y_train,alpha=0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Standardization：**\n",
    "$$\n",
    "x=\\frac{x-x_{\\mu}}{\\sigma}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "X=np.row_stack((X_train,X_test))\n",
    "\n",
    "X_avg=X.mean(axis=0)\n",
    "X_std=X.std(axis=0)\n",
    "\n",
    "X_train_std=(X_train-X_avg)/X_std\n",
    "X_test_std=(X_test-X_avg)/X_std"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.1533641928430791\t1.1528875544037156\t1.1524112302061575\t1.1519352196779988\t1.151459522248928\t1.150984137350719\t1.150509064417222\t1.1500343028843558\t1.1495598521900992\t1.1490857117744822\t"
     ]
    }
   ],
   "source": [
    "W,b=linear_reg(X_train_std,Y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9520552989675496\t0.23251662877347026\t0.2325058668412472\t0.23250581106415466\t0.23250581077482452\t0.2325058107733237\t0.23250581077331586\t0.23250581077331586\t0.23250581077331586\t0.23250581077331586\t"
     ]
    }
   ],
   "source": [
    "W,b=linear_reg(X_train_std,Y_train,alpha=0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这个简单实验比较发现，相比于Normalization，Standardization能够更快地加速模型的收敛，这跟最小二乘法对于数据先验分布为正态分布的假设是一致的。\n",
    "\n",
    "数据归一化工具简单打包："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Standardization(X_train,X_test):\n",
    "    X=np.row_stack((X_train,X_test))\n",
    "\n",
    "    X_avg=X.mean(axis=0)\n",
    "    X_std=X.std(axis=0)\n",
    "\n",
    "    X_train_std=(X_train-X_avg)/X_std\n",
    "    X_test_std=(X_test-X_avg)/X_std\n",
    "    return X_train_std,X_test_std"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## mini-batch梯度下降"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.474682059274926\t0.5356777959141976\t0.47384792900798556\t0.4391595004502859\t0.4132333630693115\t0.3934206203125686\t0.3781957553275647\t0.3664223754397032\t0.35724210453440275\t0.3500104774937606\t"
     ]
    }
   ],
   "source": [
    "def linear_reg(X,Y,alpha=0.000001,max_iter=2000,batch_size=32):\n",
    "    Y=Y.reshape((-1,1))\n",
    "    \n",
    "    n=X.shape[0]\n",
    "    m=X.shape[1]\n",
    "    num_batch = n // batch_size\n",
    "    \n",
    "    W = np.random.rand(m).reshape((m, 1))  # 权重\n",
    "    b = 1  # 偏置\n",
    "\n",
    "    for epoch in range(max_iter):\n",
    "        \n",
    "        ######  mini-batch  ######\n",
    "        for i in range(num_batch + 1):    # 有可能有多余的不完整batch，多循环一次\n",
    "            start_index = i * batch_size\n",
    "            end_index = (i + 1) * batch_size\n",
    "            \n",
    "            if start_index < n:\n",
    "                # 切片操作不会引发越界\n",
    "                X_batch = X[start_index:end_index + 1]\n",
    "                Y_batch = Y[start_index:end_index + 1]\n",
    "                \n",
    "            n_batch=X_batch.shape[0]\n",
    "            Y_hat_batch=np.dot(X_batch, W)+b\n",
    "\n",
    "            dW = 2 * X_batch.T.dot(Y_hat_batch - Y_batch) / n_batch\n",
    "            db = 2 * np.sum(Y_hat_batch - Y_batch) / n_batch\n",
    "        \n",
    "        \n",
    "            W -= alpha * dW\n",
    "            b -= alpha * db\n",
    "        ######  mini-batch  ######\n",
    "        \n",
    "        if epoch%200==0:\n",
    "            Y_hat=np.dot(X, W)+b\n",
    "            L=np.sum((Y-Y_hat)**2)**0.5/n_sample\n",
    "            print(L,end='\\t')\n",
    "\n",
    "    return W,b\n",
    "\n",
    "W,b=linear_reg(X_train,Y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "单纯的mini-batch并没有很明显的提升模型表现，我们再加上Standardization："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.2035063734435572\t1.1947475885014098\t1.1861772410107605\t1.1777872195520669\t1.169569794197096\t1.1615176004647043\t1.1536236237540518\t1.1458811842592154\t1.1382839223671453\t1.130825784539052\t"
     ]
    }
   ],
   "source": [
    "X_train_std,X_test_std=Standardization(X_train,X_test)\n",
    "W,b=linear_reg(X_train_std,Y_train,batch_size=32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "额，表现并不是很理想，甚至在加大max_iter值后模型还是没有收敛，可能是数据量太小，mini-batch引入的随机性对模型的收敛起了一个反作用。"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
